Transcriptome annotation using tandem SAGE tags
Authors
Abstract
Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding the number of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to apply this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict as yet unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation.
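The idea of a tag-delimited genomic sequence can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not describe; it is a naive exact-match search under stated assumptions: the genome is a plain string, only the forward strand is scanned, and the two tags must lie within a hypothetical `max_span` of each other. The names `find_all`, `locate_tdgs`, and `max_span` are illustrative and do not come from the paper.

```python
from typing import List, Tuple


def find_all(genome: str, tag: str) -> List[int]:
    """Return every 0-based start position of tag in genome (overlaps allowed)."""
    positions = []
    start = genome.find(tag)
    while start != -1:
        positions.append(start)
        start = genome.find(tag, start + 1)
    return positions


def locate_tdgs(
    genome: str, tag_a: str, tag_b: str, max_span: int = 1000
) -> List[Tuple[int, int, str]]:
    """Pair each tag_a hit with every downstream tag_b hit within max_span bases.

    Returns (start, end, sequence) triples, end exclusive, where the sequence
    runs from the first base of tag_a to the last base of tag_b.
    Assumption: forward strand only, exact matches only.
    """
    hits_a = find_all(genome, tag_a)
    hits_b = find_all(genome, tag_b)
    results = []
    for a in hits_a:
        for b in hits_b:
            end = b + len(tag_b)
            if b > a and end - a <= max_span:
                results.append((a, end, genome[a:end]))
    return results


# Toy example: two made-up tags separated by a short spacer.
tag_a = "CATGAAACCCGGG"
tag_b = "GATCTTTGGGCCC"
genome = "TTTT" + tag_a + "ACGTACGT" + tag_b + "AAAA"
for start, end, seq in locate_tdgs(genome, tag_a, tag_b):
    print(start, end, seq)
```

A genome-scale search would need an index rather than repeated string scans, as well as handling of both strands; the sketch only shows how a pair of tags in tandem selects a candidate region more specifically than either tag alone.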
Related resources
Combining SAGE tags to predict genomic transcribed regions
Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial Analysis of Gene Expression (SAGE) can reveal new RNAs transcribed from previously unrecognized genomic regions. However, conventional SAGE tags are too short to identify unambiguously unique sites in large geno...
Tags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...
Assessment of SAGE in transcript identification.
An essential step in Serial Analysis of Gene Expression (SAGE) is tag mapping, which refers to the unambiguous determination of the gene represented by a SAGE tag. Current resources for tag mapping are incomplete, and thus do not allow assessment of the efficacy of SAGE in transcript identification. A method of tag mapping is described here and applied to the Drosophila melanogaster and Caenorh...
A human glomerular SAGE transcriptome database
BACKGROUND To facilitate in the identification of gene products important in regulating renal glomerular structure and function, we have produced an annotated transcriptome database for normal human glomeruli using the SAGE approach. DESCRIPTION The database contains 22,907 unique SAGE tag sequences, with a total tag count of 48,905. For each SAGE tag, the ratio of its frequency in glomeruli ...
Deep SAGE analysis of the Caenorhabditis elegans transcriptome
We employed the Tag-seq technique to generate global transcription profiles for different strains and life stages of the nematode C. elegans. Tag-seq generates cDNA tags as does Serial Analysis of Gene Expression (SAGE), but the method yields a much larger number of tags, generating much larger data sets than SAGE. We examined differences in the performance of SAGE and Tag-seq by comparing gene...